The Evolution of Complex Networks: A New Framework 
We introduce a new framework for the analysis of the dynamics of networks, based on randomly
reinforced urn (RRU) processes, in which the weight of the edges is determined by a reinforcement 
mechanism. We rigorously explain the empirical evidence that in many real networks there is a subset 
of "dominant edges" that control a major share of the total weight of the network. Furthermore, 
we introduce a new statistical procedure to study the evolution of networks over time, assessing whether a
given instance of the network is taken at its steady state or not. Our results are quite general, since
they are not based on a particular probability distribution or functional form of the weights. We 
test our model in the context of the International Trade Network, showing the existence of a core 
of dominant links and determining its size. 

A large number of real systems in different domains, such as physics [1], economics [2, 3],
computer science [4, 5], social science [6], transportation [7] and others, can be efficiently
described by a network structure, where the nodes are the system entities and the links
represent the relations between them [8]. By comparison, relatively few models have been
presented to explain the onset of scale-invariance in the statistical distributions of degree
and other topological properties (such as betweenness, clustering and assortativity). In this
paper we present a new model of network growth and evolution based on the theory of randomly
reinforced urn (RRU) processes. The model maps the weight of a particular edge to the number
of balls of a particular color that are added to the urn. Our model is particularly suitable
for dense and weighted networks, a situation that is often problematic both for modeling and
for randomization. Thanks to the analytical properties of this treatment, one can define a
statistical procedure for investigating the dominance of one set of edges (colors) over the
others. Importantly, our procedure makes it possible to determine whether a particular
instance of a dynamical network is taken at the steady state of the network evolution or not.

The model builds on a recent kind of randomly reinforced urn (RRU) processes [9]-[13], in
which the probability of picking an edge (color) depends on its weight. At each time-step
(time is marked by the draws) the picked edge (color) receives a random weight (number of
added balls), and at the next time-step the probability of picking a certain edge (color) is
proportional not simply to the number of times that edge (color) has been drawn, but to the
total weight already allocated to that edge (total number of added balls of that color): a
sort of weighted preferential attachment.
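
The difference between count-based and weight-based reinforcement can be illustrated with a minimal Python sketch. The function name `selection_probabilities` and the toy numbers are assumptions made for illustration, as is the choice of adding a strictly positive initial parameter to each edge's accumulated total.

```python
def selection_probabilities(initial, accumulated):
    """Return the probability of picking each edge at the next step.

    initial[l]     -- strictly positive starting parameter of edge l (assumed name)
    accumulated[l] -- quantity reinforced so far on edge l (draw count or total weight)
    """
    scores = [a + x for a, x in zip(initial, accumulated)]
    norm = sum(scores)
    return [s / norm for s in scores]


# Toy case (made-up numbers): two edges drawn 3 times each,
# but the second edge received heavier weights on its draws.
initial = [1.0, 1.0]
draw_counts = [3, 3]
total_weight = [3.0, 9.0]

by_count = selection_probabilities(initial, draw_counts)    # plain preferential attachment
by_weight = selection_probabilities(initial, total_weight)  # weighted preferential attachment
```

With equal draw counts the count-based rule cannot distinguish the two edges, while the weight-based rule favors the edge that has collected more weight.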

If we consider a network with $N$ vertices and $L$ edges (directed or not; we typically
consider complete graphs), then this dynamics defines a weighted adjacency matrix $W_u$ for
every time-step $u$, where the generic element $[W_u]_{ij}$ is the total weight on the edge
$(i,j)$ until time-step $u$ (i.e. the total number of added balls of color $(i,j)$ until
time-step $u$). Hereafter we indicate the various edges by the index $\ell$ (with
$\ell \in [1, L]$). Similarly we define a matrix $K_u$ whose element $k_{u\ell} = [K_u]_\ell$
represents the total number of draws of edge $\ell$ until time-step $u$.

More specifically, the dynamics of the network is the following. We start at time $u = 1$ by
picking an edge $\ell^* = (i^*, j^*)$ according to the following rule: every edge $\ell$ can
be picked with an initial probability $Z_{1\ell} = a_\ell / \sum_{\ell'=1}^{L} a_{\ell'}$,
where the parameters $a_\ell$ are strictly positive. (The actual value of these parameters
plays no role in the asymptotic results and the statistical tools we present in the sequel.)
After that, a random weight $W_{1\ell^*} > 0$ is added to the chosen edge $\ell^*$. We do not
pay particular attention to the specific form of these weights, provided that the weights are
independent positive random variables which are uniformly bounded by a constant. We finally
pick a new edge according to the probability distribution given by
$$Z_{u+1,\ell} = \frac{a_\ell + \sum_{n=1}^{u} W_{n\ell} X_{n\ell}}{\sum_{\ell'=1}^{L} a_{\ell'} + \sum_{n=1}^{u} \sum_{\ell'=1}^{L} W_{n\ell'} X_{n\ell'}} \qquad (1)$$
where $X_{n\ell} = 1$ if at the $n$th time-step the edge $\ell$ was chosen and $X_{n\ell} = 0$
otherwise. In other words we define (akin to the preferential attachment idea) a probability
of edge-extraction that takes into account the previous growth of the network. We can write



$$k_{u\ell} = [K_u]_\ell = \sum_{n=1}^{u} X_{n\ell}, \qquad [W_u]_\ell = \sum_{n=1}^{u} W_{n\ell} X_{n\ell}. \qquad (2)$$
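
The dynamics described above can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions: the function name `simulate_rru_network`, the callback `draw_weight`, the default initial parameters equal to 1, and the uniform weight law in the usage lines are hypothetical choices, not taken from the text. The counters `K` and `W` play the role of the draw counts and total weights in (2).

```python
import random


def simulate_rru_network(num_edges, num_steps, draw_weight, initial=None, seed=0):
    """Simulate a reinforced-urn edge process and return (K, W).

    draw_weight(rng, edge) must return a positive, bounded random weight
    (assumed interface). K[l] counts the draws of edge l, W[l] is the total
    weight accumulated on edge l.
    """
    rng = random.Random(seed)
    a = list(initial) if initial is not None else [1.0] * num_edges
    K = [0] * num_edges
    W = [0.0] * num_edges
    for _ in range(num_steps):
        # Pick an edge with probability proportional to its initial
        # parameter plus the total weight it has received so far.
        scores = [a[l] + W[l] for l in range(num_edges)]
        chosen = rng.choices(range(num_edges), weights=scores, k=1)[0]
        K[chosen] += 1
        W[chosen] += draw_weight(rng, chosen)
    return K, W


# Usage: 5 edges, 200 draws, weights uniform on [0.5, 1.5] (illustrative choice).
K, W = simulate_rru_network(5, 200, lambda rng, edge: rng.uniform(0.5, 1.5), seed=42)
```

Passing the weight law as a callback keeps the sketch agnostic about the distribution of the weights, in line with the generality claimed for the model.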



Our model is related to weighted-network modeling, since it is described not only by binary
adjacency matrices, but also by the sequence $(K_u)$, which counts the number of times each
edge is picked, and the sequence $(W_u)$, which records the total weight of each edge.

Given a subset $\mathcal{D}$ of the $L$ edges, we suppose that, for every time-step $u$,
$$E[W_{u\ell}] = \mu^* \ \text{for } \ell \in \mathcal{D}, \qquad E[W_{u\ell}] = \mu_\ell < \mu^* \ \text{for } \ell \notin \mathcal{D}, \qquad (3)$$
and $\mathrm{Var}[W_{u\ell}] = \sigma_\ell^2 \in (0, +\infty)$. If the set $\mathcal{D}$
coincides with the $L$ edges, the above conditions imply that the weights have the same mean
value for all edges. Conversely, when the number of elements in the set $\mathcal{D}$ is lower
than $L$, the weights associated with the edges in $\mathcal{D}$ "dominate in mean" those
associated with the others. (Note that a typical case of the first type holds when every
weight $W_{u\ell}$ is equal to 1, i.e. the classical preferential attachment.) Our analysis
covers both these cases.

As $u \to +\infty$, the probability $Z_{u\ell}$ of choosing the edge $\ell$ converges almost
surely (a.s.) to zero when $\ell \notin \mathcal{D}$, while it converges a.s. to a random
variable $Z_{\ell^*}$ with values in $]0,1]$ a.s. when $\ell = \ell^* \in \mathcal{D}$, and
$\sum_{\ell^* \in \mathcal{D}} Z_{\ell^*} = 1$ [13]. Therefore the notion of "dominant edges"
could provide a formalization of the empirical evidence that many real networks are rather
sparse. This means that, out of all the possible edges, a club of edges collects the major
fraction of the total weight of the network. More precisely, it has been proved that, as the
number of time-steps $u$ grows, the total weight on the dominant edges grows according to

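$$\frac{1}{u} \sum_{\ell^* \in \mathcal{D}} [W_u]_{\ell^*} \xrightarrow{a.s.} \mu^* \qquad (4)$$
while the same limit for the dominated edges is zero, i.e.
$$\frac{1}{u} \sum_{\ell \notin \mathcal{D}} [W_u]_\ell \xrightarrow{a.s.} 0. \qquad (5)$$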



Moreover, for a dominant edge $\ell^*$, the total weight associated with that edge, normalized
by the total weight of the network (assumed to be non-zero), converges a.s. to the previous
random variable $Z_{\ell^*}$ according to
$$\frac{[W_u]_{\ell^*}}{\sum_{\ell=1}^{L} [W_u]_\ell} \xrightarrow{a.s.} Z_{\ell^*} \qquad (6)$$
and the number of extractions of $\ell^*$ divided by the total number of extractions converges
a.s. to the same random variable, that is
$$\frac{k_{u\ell^*}}{u} \xrightarrow{a.s.} Z_{\ell^*}. \qquad (7)$$
The corresponding limits for dominated edges are both equal to zero. In particular, we have
$u^{1-\lambda} Z_{u\ell} \xrightarrow{a.s.} 0$ for $\ell \notin \mathcal{D}$ and each
$\lambda \in (\Lambda, 1)$, where $\Lambda = \max_{\ell \notin \mathcal{D}} \mu_\ell / \mu^*$.
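
These limit relations can be checked numerically on a small two-class urn. The sketch below is only an illustration under assumed settings: 20 edges, the first 10 forming the dominant set, weights uniform on [0, 2] (mean 1) for dominant edges and uniform on [0, 1] (mean 0.5) for the others, all initial parameters equal to 1, and 20000 draws. None of these values come from the text.

```python
import random


def run_two_class_urn(num_edges=20, num_dominant=10, num_steps=20000, seed=1):
    """Simulate the urn with a preassigned dominant set {0, ..., num_dominant - 1}."""
    rng = random.Random(seed)
    K = [0] * num_edges      # number of draws of each edge
    W = [0.0] * num_edges    # total weight accumulated on each edge
    for _ in range(num_steps):
        scores = [1.0 + w for w in W]  # initial parameter 1 plus accumulated weight
        edge = rng.choices(range(num_edges), weights=scores, k=1)[0]
        upper = 2.0 if edge < num_dominant else 1.0  # dominant edges have the larger mean
        K[edge] += 1
        W[edge] += rng.uniform(0.0, upper)
    return K, W


K, W = run_two_class_urn()
total_weight = sum(W)
total_draws = sum(K)
weight_share = [w / total_weight for w in W]
draw_share = [k / total_draws for k in K]

# Share of the total weight held by the dominated edges (expected to be small).
dominated_share = sum(weight_share[10:])
# Largest gap between weight share and draw share over the dominant edges.
max_gap = max(abs(ws - ds) for ws, ds in zip(weight_share[:10], draw_share[:10]))
```

In runs of this kind the dominated edges should end up with only a small fraction of the total weight, and for each dominant edge the weight share and the draw share should be close to each other, as both approximate the same random limit.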

Figure 1: (color online) Numerical simulations of the model (with $L = 2500$) with no
preassigned dominant set (1 class) and with one preassigned dominant set (2 classes). On the
left we plot the frequency distribution of the weights in the network, for uniform and
truncated Gaussian (G) choices of the distribution of $W$ (for comparison we also plot the
weight distribution of the COMTRADE data). On the right we plot the histogram of the number of
draws of each edge with no dominant set (top) and with the set $[1, 1225]$ as the dominant
set (bottom).



Based on the above limit relations and some asymptotic results, analytically proved in
[10, 11], we have developed a statistical test for the class $\mathcal{D}$. In particular,
we can test the hypothesis that a given subset becomes the class of dominant edges during the
evolution of the network. Similarly, it is possible to test whether a particular instance of a
given network has a weight distribution that has already evolved into its stationary state or
not.

We assume as a null hypothesis that the "dominant set" $\mathcal{D}$ coincides with a certain
subset of edges $\mathcal{D}^*$ with $\mathrm{card}(\mathcal{D}^*) > 2$ and consider a certain
level $(1 - \alpha)$ (typically $\alpha = 5\%, 10\%$). Then we fix $\ell^* \in \mathcal{D}^*$
and compare the quantity (defined in the sequel)
with the quantile $q_\alpha$ of the standard normal distribution $\mathcal{N}(0, 1)$ of order
$(1 - \alpha/2)$ (that is, $q_\alpha$ is the number such that
$\mathcal{N}(0, 1)(q_\alpha, +\infty) = \alpha/2$, and $q_\alpha = 1.96$ for $\alpha = 5\%$
and $q_\alpha = 1.645$ for $\alpha = 10\%$). If the computed quantity is greater than
$q_\alpha$, then we reject the null hypothesis at the (approximate) level $(1 - \alpha)$;
otherwise we cannot reject it. The random variable $U_{u\ell^*}$ is defined as



c'«£* = 7 _ n4 (y) 



where $\bar{X}_{u\ell} = \sum_{n=1}^{u} X_{n\ell}/u$, $\hat{\mu}^*$ is an estimate of the
mean value $\mu^*$ and $\hat{\sigma}^2_\ell$ is an estimate of the variance $\sigma^2_\ell$.
Further, the random variable $C_{u\ell^*}$ is defined as
Simulations have shown that, if we perform the above test taking $\mathcal{D}^*$ exactly equal
to the preassigned dominant set, then the percentage of indexes $\ell^*$ for which the test
rejects the hypothesis is very low (2.28% for $\alpha = 10\%$ and 0.82% for $\alpha = 5\%$).
From now on we will call this percentage the "rejection percentage". If we consider a
different $\mathcal{D}^*$ with the same cardinality as the real dominant set, the rejection
percentage increases (even if we change a single element): the more $\mathcal{D}^*$ and the
real dominant set differ, the higher the rejection percentage is (we obtained values up to 93%
for $\alpha = 10\%$ and 85% for $\alpha = 5\%$). However, we observed that the power of this
test decreases as the cardinality of $\mathcal{D}^*$ decreases. For instance, it is not able
to reject the null hypothesis when $\mathcal{D}^*$ is strictly contained in the real dominant
set. As a solution to this problem, we add to the previous test another statistical test
obtained by replacing the random variable $U_{u\ell^*}$ by
This second test works very well for $\mathcal{D}^*$ with small cardinality (the rejection
percentage goes from 80% to 100%).

In sum, based on these two statistical tests, we have 
introduced a statistical procedure to study the dominant 
set of a network and predict if a certain edge distribution 
will disappear in the steady state of the graph evolution 
or not. 
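
The decision rule shared by the two tests, and the "rejection percentage" derived from it, can be sketched as follows. The sketch takes the test statistics as given inputs, since it does not attempt to reproduce their definition. The names `normal_quantile` and `rejection_percentage`, the comparison of the absolute value of the statistic with the quantile, and the sample values are assumptions made for illustration.

```python
from statistics import NormalDist


def normal_quantile(alpha):
    """Quantile of order (1 - alpha/2) of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def rejection_percentage(stat_values, alpha):
    """Percentage of edges whose statistic exceeds the quantile.

    stat_values -- one precomputed test statistic per edge (hypothetical input).
    The absolute value is compared with the quantile, which is an assumption
    suggested by the two-sided order (1 - alpha/2).
    """
    q = normal_quantile(alpha)
    rejected = [abs(s) > q for s in stat_values]
    return 100.0 * sum(rejected) / len(rejected)


q05 = normal_quantile(0.05)   # about 1.96
q10 = normal_quantile(0.10)   # about 1.645

sample = [0.1, -2.5, 1.7, 3.0]                 # made-up statistics
pct05 = rejection_percentage(sample, 0.05)     # 2 of 4 exceed 1.96
pct10 = rejection_percentage(sample, 0.10)     # 3 of 4 exceed 1.645
```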

As an application and a test, we consider the international trade network (ITN), also known in
the complex network literature as the world-trade web [14]. The ITN is defined as the network
of import-export relationships between world countries in a given period (usually a year).
Many efforts have been devoted to analyzing the structure and the dynamics of the ITN from an
empirical and theoretical modeling perspective (see, for instance, [15]-[22]). However,
existing contributions are not able to rigorously explain the evidence that there exists a
"club of a few rich countries" [17] that control a major share of the trade network. This
issue of "rich-club" detection is particularly important also from a theoretical point of
view. The rich-club property (i.e. the proportion of vertices whose degree is larger than a
certain value that are also connected to each other) can be defined in a proper way only for
sparse networks [23], while no consensus exists for the case of dense networks [24] such as
the ITN. In particular, for dense networks it is particularly difficult to define a reference
or null case against which one can measure the specific features of the real system. Our model
allows a natural description of this case and it also allows for a rigorous analysis of the
stability of the statistical distributions. In the context of the ITN, we assume that the
nodes represent the countries and the edges represent the trade between them. With regard to
the weights [25], there are different possibilities. The most natural choice is to define the
weight of a certain edge $\ell = (i,j)$ in terms of the value of the flow from $i$ to $j$.
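
As a minimal illustration of this choice of weights, the following Python sketch accumulates directed edge weights from a list of trade records. The record layout `(year, exporter, importer, export_value)`, the function name `accumulate_trade_network`, and the toy records are assumptions for illustration only and do not reflect the actual format of any trade database. Treating each positive record as one observation of its edge is likewise an assumption of the sketch.

```python
from collections import defaultdict


def accumulate_trade_network(records):
    """Build total weights W and observation counts K for directed edges.

    records -- iterable of (year, exporter, importer, export_value) tuples
               (hypothetical layout). Records with no positive flow are skipped.
    """
    W = defaultdict(float)  # total export value on each ordered pair
    K = defaultdict(int)    # number of periods in which the pair was observed
    for _year, exporter, importer, value in records:
        if value <= 0:
            continue
        edge = (exporter, importer)  # direction matters: (i, j) differs from (j, i)
        W[edge] += value
        K[edge] += 1
    return dict(W), dict(K)


# Toy records with made-up values.
records = [
    (1999, "A", "B", 10.0),
    (2000, "A", "B", 5.0),
    (2000, "B", "A", 2.0),
    (2000, "A", "C", 0.0),
]
W, K = accumulate_trade_network(records)
```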

As a real-data example we consider here the trades between nations in the years 1948-2000, as
can be reconstructed from COMTRADE data [26]. We computed for each year and for each pair of
countries $(A, B)$ the total exports (when present) from $A$ to $B$. The ordered pair $(A, B)$
is an edge (color), while the edge weight for a certain year represents an extraction of that
edge (color), where the number of added balls is the dollar amount of the total exports for
that edge in that year. For the COMTRADE data we do not know the set of "dominant edges" in
advance, but we can use the statistical test previously defined to extract at least a core
subset of it. In order to get this core subset we fixed $\mathcal{D}^*$ of size 2000 and
performed the first test for $\mathcal{D}^*$, picking $\ell^*$ in descending order starting
from the largest edge weight. If we then plot the number of no-rejections along the whole set
of $\ell^*$ in $\mathcal{D}^*$, we find that for the ordered case the number of no-rejections
grows linearly with constant slope close to 1, but at a certain point starts bending (see
Fig. 2). After this bending it saturates and reaches a plateau where the $\ell^*$ always give
a rejection. Remarkably, we found an "optimal" size of $\mathcal{D}^*$ for which the
difference between the rejection percentage for the ordered edges and that for the random case
is maximal, revealing that the set of top-ranking edges in that subset is the best fit for the
set of "dominant edges".
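
The bookkeeping behind this procedure can be sketched in Python, taking the test outcomes as given. The helper names `cumulative_no_rejections` and `best_core_size`, the use of the absolute difference, and all numbers below are hypothetical and only illustrate the two steps: accumulating no-rejections along the ordered edges, and locating the candidate size at which the ordered and random rejection percentages differ most.

```python
from itertools import accumulate


def cumulative_no_rejections(rejected):
    """Running count of no-rejections along the edges in their given order.

    rejected -- sequence of booleans, True when the test rejects for that edge.
    """
    return list(accumulate(0 if r else 1 for r in rejected))


def best_core_size(sizes, ordered_pct, random_pct):
    """Return (size, gap) where the rejection percentages differ most.

    sizes       -- candidate sizes of the tested subset
    ordered_pct -- rejection percentage for top-ranked edges, one per size
    random_pct  -- rejection percentage for random subsets, one per size
    """
    gaps = [abs(r - o) for o, r in zip(ordered_pct, random_pct)]
    best = max(range(len(sizes)), key=lambda i: gaps[i])
    return sizes[best], gaps[best]


# Made-up test outcomes for seven edges ranked by decreasing weight.
curve = cumulative_no_rejections([False, False, False, True, False, True, True])

# Made-up rejection percentages for four candidate sizes.
size, gap = best_core_size(
    [500, 1000, 1500, 2000],
    [2.0, 3.0, 10.0, 40.0],
    [60.0, 70.0, 80.0, 85.0],
)
```

A curve that first rises with slope 1 and then flattens, as in the toy list above, is the pattern that signals where the core subset ends.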

In summary, we present here a model of weighted-network growth based on a weighted
preferential attachment principle [27]: the probability of picking an edge depends on the
total weight of that edge (and not simply on the number of times it has been picked)
[28], [29]. We provide a theoretical framework which accounts for the empirical evidence that
many real networks grow in a heterogeneous way, generating a subset of dominant edges that
controls a major share of the total weight of the network, while the weight of the other
connections is negligible. Our approach is quite general and flexible, since it does not
require a particular probability distribution or functional form of the weights. Furthermore,
our model

Figure 2: (color online) In the lower panel we show the number of no-rejections for the
COMTRADE data and for the simulated data of an urn with colored balls in the case of uniform
distributions. For both cases we considered a $\mathcal{D}^*$ of size 2000 and ordered the
edges/colors in descending order according to the edge weight/number of balls. We then
executed the test with $\ell^*$ running from the highest to the lowest value, accumulating the
number of no-rejections on the y-axis. After a constant no-rejection rate, the COMTRADE data
(blue line) start bending, signaling the presence of a core subset of dominant edges. The same
happens for the urn with colored balls (red line), but with a much sharper turning point,
located exactly at the dominant-set size of 1225, known a priori. In the inset the same
procedure has been performed for a random $\mathcal{D}^*$ for the two corresponding cases. In
the upper panel, we plot the difference between the rejection percentage for the ordered and
the random case, which shows a maximum where the two curves start bending.



produces in a natural way dense networks that can be used as a reference or benchmark for
real dense networks. The mapping onto RRU processes has allowed us to introduce a statistical
procedure for making
inference on the class of dominant links. Thanks to the 
above procedure, it is now possible to quantitatively test 
the convergence to steady state in network dynamics, a 
problem often encountered in assessing the significance 
of observations in complex networks. 